Allow batching the output of a join #2310
Signed-off-by: Robert (Bobby) Evans <[email protected]>
I'll try to add some more Javadoc comments too.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuHashJoin.scala
I have been adding spilling for the gather maps, which let me push things a bit further, and I found a bug in the gather implementation.
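A rough illustration of what spilling a gather map means, as a minimal sketch only. Here "device memory" is simulated by a plain `Array[Int]` and the spill target is a host `java.nio.ByteBuffer`. `SpillableGatherMap`, `spill`, and `get` are hypothetical names and are not the spark-rapids spill framework or the cuDF API.

```scala
import java.nio.ByteBuffer

// Hypothetical sketch: a gather map whose indices can be moved out of
// (simulated) device memory into a host buffer and restored on demand.
final class SpillableGatherMap(initial: Array[Int]) {
  // Stand-in for the device-resident copy of the gather map.
  private var onDevice: Option[Array[Int]] = Some(initial)
  // Host-side copy that exists only while the map is spilled.
  private var onHost: Option[ByteBuffer] = None

  def isSpilled: Boolean = onDevice.isEmpty

  /** Copy the indices to a host buffer and drop the "device" copy.
    * Calling this on an already spilled map does nothing. */
  def spill(): Unit = onDevice.foreach { arr =>
    val buf = ByteBuffer.allocate(arr.length * 4)
    arr.foreach(i => buf.putInt(i))
    buf.flip()
    onHost = Some(buf)
    onDevice = None
  }

  /** Return the indices, first restoring them to "device" memory if spilled. */
  def get: Array[Int] = onDevice.getOrElse {
    val buf = onHost.get.duplicate()
    val arr = Array.fill(buf.remaining() / 4)(buf.getInt())
    onDevice = Some(arr)
    onHost = None
    arr
  }
}
```

The point of the design is that a task holding a large gather map between output batches does not have to keep it pinned in GPU memory: it can be spilled while the task is not on the GPU and restored when the next batch is gathered.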
First pass @revans2
...park300/src/main/scala/com/nvidia/spark/rapids/shims/spark300/GpuBroadcastHashJoinExec.scala
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuBoundAttribute.scala
I think I have addressed all of the review comments. I put together my own quick hack for rapidsai/cudf#8121, and it is not enough to be able to run q72. Even with a batch size of 26m and 200 partitions, it took over 6 minutes to finish one of the 200 join tasks and failed on the next one. We are going to have to think hard about what we want to do to support query 72, but all of the other queries run with reasonable configurations.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuShuffledHashJoinBase.scala
Tests are not passing with struct joins; I need to do some more debugging.
build
I upmerged and had to update the code for the new struct join support. That was a good thing, because it exposed a bug in my filtering code: before the struct support it would only have been a performance regression, but afterwards it became an error. This should be ready to go now; the dependency is merged.
This is the first step toward out-of-core joins. It at least partially addresses #20.
This depends on rapidsai/cudf#8118 going in first.
For most cases that I have tested, this is strictly better than what was there before. If the output of the join fits within the output batch size, the join happens exactly as it does today. If the output is larger than that, we can now emit it in multiple batches. The problem I have found is that the gather map is not spillable, and after a single batch is output the GPU semaphore is released. This means that for contrived joins that explode evenly, each active task can hold a potentially large gather map in GPU memory. I think I can make it spillable without a lot of work; if so, I might just do it. I also want to spend some time running benchmarks to see if this helps fix some of the exploding-join issues we have seen in them.
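To make the batching idea concrete, here is a minimal Scala sketch under stated assumptions: plain arrays stand in for cuDF gather maps, every output row is assumed to have the same estimated size in bytes, and the names (`JoinOutputBatching`, `GatherMaps`, `planBatches`, `sliceMaps`, `gather`) are hypothetical, not the plugin's actual API. If the whole output fits in the target size it yields a single batch, matching the old behavior; otherwise the gather maps are split into several row ranges and each range is gathered separately.

```scala
// Hypothetical sketch only: plain arrays stand in for cuDF gather maps, and
// none of these names come from the spark-rapids code base.
object JoinOutputBatching {
  /** Row i of the join output pairs left row left(i) with right row right(i). */
  final case class GatherMaps(left: Array[Int], right: Array[Int]) {
    require(left.length == right.length, "gather maps must have equal length")
    def numRows: Int = left.length
  }

  /** Plan [start, end) row ranges so each output batch stays under the
    * target size, assuming a fixed estimated size per output row. */
  def planBatches(totalRows: Int, bytesPerRow: Long, targetBatchBytes: Long): Seq[(Int, Int)] = {
    require(totalRows >= 0 && bytesPerRow > 0 && targetBatchBytes > 0)
    // Always take at least one row so every batch makes progress.
    val rowsPerBatch =
      math.min(math.max(1L, targetBatchBytes / bytesPerRow), Int.MaxValue.toLong).toInt
    (0 until totalRows by rowsPerBatch).map { start =>
      (start, math.min(start + rowsPerBatch, totalRows))
    }
  }

  /** Take the slice of both gather maps that belongs to one output batch. */
  def sliceMaps(maps: GatherMaps, range: (Int, Int)): GatherMaps = {
    val (start, end) = range
    GatherMaps(maps.left.slice(start, end), maps.right.slice(start, end))
  }

  /** Materialize one output batch by gathering rows from both sides. */
  def gather[L, R](leftCol: IndexedSeq[L], rightCol: IndexedSeq[R], maps: GatherMaps): IndexedSeq[(L, R)] =
    maps.left.indices.map(i => (leftCol(maps.left(i)), rightCol(maps.right(i))))
}
```

For example, `planBatches(10, 8L, 32L)` plans three batches of 4, 4, and 2 rows. Note that the full gather maps stay resident while the slices are gathered one at a time, which is why their lack of spillability matters once the GPU semaphore is released between batches.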